Introduction to Spatial Analysis

Day 1 - Concepts and Datasets

Jonathan Phillips

January, 2019

Geography

  • What is your favourite sport?
  • Do you speak Spanish?
  • Do you know who Fofão is?
  • How many kisses on the cheek do you greet someone with?
  • If you are on your own in a taxi do you sit in the front or back?
  • Do you think government policy should allow free migration?
  • Where do you live?

Geography

  • Knowledge and communication depend on where we live

  • Social norms and customs depend on where we live

  • Political preferences depend on where we live

Geography

Tobler’s First Law of Geography:

“Everything is related to everything else, but near things are more related than distant things”

Geography

  • What does ‘near’ mean?

  • Concepts of distance:
    • Euclidean
    • Great Circle
    • Manhattan
    • Levenshtein
    • Mahalanobis
    • Driving
    • Network
    • Minimum-cost
    • Genetic
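
Several of these distance concepts can be computed directly. The sketch below uses Python and only the standard library; the function names are illustrative, and the great-circle version assumes a perfectly spherical earth with a radius of 6371 km (the haversine formula).

```python
import math


def euclidean(p, q):
    """Straight-line distance between two (x, y) points on a plane."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def manhattan(p, q):
    """Distance when movement is restricted to a grid of streets."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance over the surface of a sphere (an assumption:
    the real earth is slightly flattened)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def levenshtein(s, t):
    """Number of single-character edits needed to turn string s into t."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (cs != ct)))   # substitution
        prev = curr
    return prev[-1]


print(euclidean((0, 0), (3, 4)))         # 5.0
print(manhattan((0, 0), (3, 4)))         # 7
print(levenshtein("kitten", "sitting"))  # 3
```

The same pair of places can be ‘near’ under one measure and ‘far’ under another, which is why the choice of distance is part of the analysis.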

Geography

  • What does ‘related’ mean?
    • Correlated
    • More similar
    • More different (e.g. dialing codes, which differ between neighbouring areas to avoid typing errors)



  • ‘Related’ does not mean one person ‘causes’ a similar effect on another
    • It may just be a common response to a similar environment
    • But interactions and spillovers are common

Geography

  • Locations of ‘Events’ could be ‘near’ to each other

Geography

  • Or characteristics of locations could be ‘near’ to each other

Geography

  • Multiple characteristics could also be ‘near’ to each other

Geography

  • But isn’t the world getting smaller?
    • ‘The death of distance’
    • Everything is ‘near’ on the internet
  • Relevant distances may be changing
    • Cost of flights instead of kilometres or hours
    • Language and social network instead of proximity to radio tower
  • Spatial relationships take place at multiple scales
    • I am Welsh, British, European etc.
    • The similarities between rural China and rural Russia are greater than the differences

Geography

  • Lots of interesting questions are really non-spatial
    • We can draw maps of them
    • But the conclusion does not depend on the locations of the units

Non-spatial Question | Spatial Question
Which state in Brazil is richest? (DF) | Where in Brazil are states richest? (Southeast)
How many countries have had cases of Ebola? (11) | Which part of Africa was affected by Ebola? (West and Central)
What is the population of the USA? (~325m) | How many people live west of the Mississippi? (~136m)

Geography

  • Geography is more than just ‘clustering’

  • A Typology of Spatial Relationships
    1. Clustering
    2. Natural resources/barriers
    3. Administrative barriers

Geography

  • Physical features also affect social and political processes
    • Attracting economic activity
    • Preventing interactions

Merits of Spatial Analysis

Opportunities:

  • Deeper explanations for common outcomes
  • Where helps us understand why
  • Avoid confounding relationships
  • Enabling new inferential methodologies

Limitations:

  • Data are not ‘independent’ for statistical analysis
  • Data are often aggregated, and the level of aggregation affects our conclusions (Modifiable Areal Unit Problem, Ecological Fallacy)
  • Lengths and distances along complex shapes (e.g. coastlines) are not ‘fixed’: they depend on the scale of measurement (fractals)

Map Literacy

  • Maps are clear and convincing
    • Patterns may only be visible when arranged spatially
    • If you have spatial data, why put it in a table or a chart?

Map Literacy

  • But maps still require careful interpretation

Map Literacy

  • Scale
    • Can I walk from The Art Institute of Chicago to Union Station in 10 minutes?

Map Literacy

  • Compass
    • What’s the best place to view the sunset in the Wirral (UK)?

Map Literacy

  • Legend
    • Can be manipulated to convey relevant (or misleading!) conclusions

Map Literacy

  • Choosing the Indicator
    • The most important choice! What precisely do we want to convey?

Map Literacy

  • Mapping values to colours
    • Hard: Choosing break points between categories
    • Equal intervals, quantiles, standard deviations, ‘natural’ breaks
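
To see how much the choice of break points matters, the sketch below compares equal-interval and quantile breaks on a small, deliberately skewed set of values. It is Python, standard library only (`statistics.quantiles` needs Python 3.8 or later); the data and function names are hypothetical.

```python
import bisect
import statistics


def equal_interval_breaks(values, k):
    """k classes of equal width between the minimum and maximum."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


def quantile_breaks(values, k):
    """k classes holding (roughly) the same number of observations."""
    return statistics.quantiles(values, n=k, method="inclusive")


def classify(value, breaks):
    """Index of the class (0 .. len(breaks)) that a value falls into."""
    return bisect.bisect_right(breaks, value)


incomes = [1, 2, 3, 4, 100]  # hypothetical values with one extreme outlier

print(equal_interval_breaks(incomes, 4))  # [25.75, 50.5, 75.25]
print(quantile_breaks(incomes, 4))        # [2.0, 3.0, 4.0]
```

With equal intervals, four of the five values land in the lowest class and the map would look almost uniform; with quantiles the same data spreads across all classes. Neither is ‘correct’: each tells a different story.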

Geographic Information Systems

  1. Convert the real world into a digital model
    • Necessarily simplified
  2. Compare multiple spatial layers

  3. Create measures and statistics to describe spatial relationships

Vector vs. Raster Data

  • Vector
    • Start with a blank page
    • Add specific objects (points, lines, polygons) defined by coordinates (x,y)
    • The computer stores just the coordinates of the objects
    • Non-spatial ‘Attributes’ of each object allow complex analyses
  • Raster
    • Start with a grid
    • Each grid square (pixel) has a value
    • The computer stores one value for every grid square (fixed memory size)
    • Mostly for ‘continuous’ remote sensing (satellite) images
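
The difference between the two models can be sketched as two small data structures. This is an illustration in plain Python, not the storage format of any real GIS package; the class and field names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class VectorFeature:
    """One object on the 'blank page': coordinates plus attributes."""
    geometry_type: str   # "Point", "Line" or "Polygon"
    coordinates: list    # list of (x, y) pairs
    attributes: dict = field(default_factory=dict)


@dataclass
class Raster:
    """A regular grid: one stored value per cell, no per-cell coordinates."""
    origin_x: float      # x of the grid's left edge
    origin_y: float      # y of the grid's top edge
    cell_size: float     # width and height of each square cell
    values: list         # rows of cell values, top row first

    def value_at(self, x, y):
        """Look up the cell that contains location (x, y)."""
        col = int((x - self.origin_x) // self.cell_size)
        row = int((self.origin_y - y) // self.cell_size)
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[row])):
            raise IndexError("location is outside the raster extent")
        return self.values[row][col]


# Hypothetical examples of each model
school = VectorFeature("Point", [(-46.725261, -23.562778)], {"name": "Example school"})
grid = Raster(origin_x=0.0, origin_y=3.0, cell_size=1.0,
              values=[[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(grid.value_at(0.5, 2.5))  # 1 (top-left cell)
print(grid.value_at(2.5, 0.5))  # 9 (bottom-right cell)
```

The vector feature grows only as objects are added, while the raster always holds rows × columns values regardless of what they contain.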

Types of Vector Data

Type | Dimensions
Point | 0
Line | 1
Polygon | 2

  • Which type to use is an analysis choice and depends on scale: a city can be a point on a national map and a polygon on a local one

Types of Vector Data

  • The attributes we assign to vector objects also vary

Locations in Space

  • How do we describe the location of an object in space?

  • The real world is 3-dimensional
    • Mostly we deal with points on the earth’s surface
    • This is not a problem for computers that can create ‘virtual earths’
  • Geographic Coordinate Systems
    • ‘Perfect’ representations of earth in the computer
    • Longitude and Latitude define any point on earth

Locations in Space

  • Latitude = Angle from the equator (N/S)
  • Longitude = Angle from Greenwich, London (E/W)

Locations in Space

  • Longitude & Latitude can be written in different formats
    • DMS (degrees, minutes, seconds): 49°30′00″N, 123°30′00″W
    • DM (degrees, decimal minutes): 49°30.0′, -123°30.0′
    • Decimal Degrees: 49.5000°, -123.5000°
  • But all of these use the same Geographic Coordinate System
    • And we ‘always’ use the same one
    • WGS-84
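
Converting between these formats is simple arithmetic: a minute is 1/60 of a degree and a second is 1/3600. A minimal Python sketch, assuming the common convention that south and west are negative (the function name is illustrative):

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus N/S/E/W to signed decimal degrees."""
    sign = -1 if hemisphere.upper() in ("S", "W") else 1
    return sign * (degrees + minutes / 60 + seconds / 3600)


print(dms_to_decimal(49, 30, 0, "N"))   # 49.5
print(dms_to_decimal(123, 30, 0, "W"))  # -123.5
```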

Locations in Space

  • What shape is the earth?
    • An ‘oblate spheroid’

  • This oblate spheroid is estimated by a ‘datum’ so we get the location correct
    • No need to worry about this, WGS-84 includes its own datum

Locations in Space

  • But we view maps on flat surfaces: paper or screens
    • Try peeling an orange
  • To produce flat maps we need a Projected Coordinate Reference System
    • Translating 3-D locations to 2-D locations
    • There are many different ways to do this, just as there are many ways to peel an orange

Locations in Space

  • Projections can preserve shape, area or distance, but not all three!
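
The familiar Mercator projection is a useful illustration of this trade-off: it preserves local shape but inflates area more and more towards the poles. The sketch below uses the spherical form of the projection (an assumption; production software uses an ellipsoid) with illustrative function names.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS-84 equatorial radius, used here as a sphere radius


def mercator(lat, lon):
    """Project latitude/longitude in degrees to flat x, y in metres
    (spherical Mercator; undefined at the poles)."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def mercator_area_inflation(lat):
    """How many times larger an area appears on the map than it does at
    the equator: lengths stretch by 1/cos(lat), so areas by 1/cos(lat)^2."""
    return 1 / math.cos(math.radians(lat)) ** 2


for lat in (0, 30, 60):
    print(lat, round(mercator_area_inflation(lat), 2))
```

At 60° north or south, areas are drawn about four times too large relative to the equator, which is why high-latitude countries look so big on Mercator maps.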

Locations in Space

  • Projections are less distorted if they are localized to one part of the earth
    • So we choose a projection based on the extent of our analysis/map
  • E.g. a UTM (Universal Transverse Mercator) zone

  • Use http://epsg.io/ to find appropriate local projections
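
UTM divides the earth into 60 zones, each 6° of longitude wide, so the zone for a location can be computed from its longitude. The Python sketch below also returns the EPSG code of the WGS 84 / UTM projection for that zone (326xx in the northern hemisphere, 327xx in the southern). It ignores the special zone exceptions around Norway and Svalbard, and it does not cover UTM projections based on local datums.

```python
def utm_zone(lon):
    """UTM zone number (1-60) for a longitude in decimal degrees."""
    return min(int((lon + 180) // 6) + 1, 60)


def wgs84_utm_epsg(lat, lon):
    """EPSG code of the WGS 84 / UTM zone containing the location."""
    base = 32600 if lat >= 0 else 32700
    return base + utm_zone(lon)


# The São Paulo coordinates used elsewhere in these slides
print(utm_zone(-46.725261))                    # 23
print(wgs84_utm_epsg(-23.562778, -46.725261))  # 32723 (WGS 84 / UTM zone 23S)
```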

Locations in Space

Coordinate Reference Systems have useful shortcut EPSG codes. In R, this is all you need.

Coordinate Ref. System | Type | EPSG Code
WGS-84 | Geographic | 4326
Corrego Alegre / UTM zone 23S (Coastal Brazil) | Projected | 22523
Chua / UTM zone 23S (Distrito Federal) | Projected | 4071

Locations in Space

  • Which Coordinate Reference System (CRS) should I use?
    • Important: You don’t choose - your data sources already come with a specific CRS
    • Important: ALL data in the analysis must use the same CRS
    • That means sometimes we have to transform from one coordinate system to another
    • For projections, do you want to convey shape, area or distance accurately?
  • For distance, what units do you want to use?
    • Geographic: Degrees
    • Projected: Meters (usually)
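
Degrees are awkward distance units because a degree of longitude covers less ground the further you are from the equator. A short Python sketch, assuming a spherical earth of radius 6371 km (the function name is illustrative):

```python
import math


def degree_lengths_km(lat, radius_km=6371.0):
    """Approximate ground length of one degree of latitude and one degree
    of longitude at a given latitude, on a spherical earth."""
    one_degree = math.pi * radius_km / 180
    return one_degree, one_degree * math.cos(math.radians(lat))


for lat in (0, 23.5, 60):
    lat_km, lon_km = degree_lengths_km(lat)
    print(f"at {lat}°: 1° latitude ≈ {lat_km:.1f} km, 1° longitude ≈ {lon_km:.1f} km")
```

This is the practical reason to transform data to a projected CRS in meters before measuring distances or areas.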

Georeferencing

  • With a CRS, computers understand locations such as -23.562778, -46.725261
  • But what if we have a street address?

Georeferencing

  • We can also take an image and georeference it to a map

  • We need to ‘pin’ the map to at least two points
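
Why two points? Two matched pairs are enough to fix a shift, a rotation and a uniform scaling (a ‘similarity’ transform). The Python sketch below solves for that transform using complex numbers. It is an illustration, not how any particular GIS does it: real tools usually ask for more control points and fit more flexible transforms. It also assumes image and map axes point the same way; if image rows count downward, negate the row value first.

```python
def fit_similarity(image_pts, map_pts):
    """Build a function mapping image (x, y) to map (x, y) from two
    matched control points, allowing shift, rotation and uniform scale."""
    p1, p2 = (complex(*p) for p in image_pts)
    m1, m2 = (complex(*m) for m in map_pts)
    if p1 == p2:
        raise ValueError("the two control points must be different")
    a = (m2 - m1) / (p2 - p1)   # rotation and scale
    b = m1 - a * p1             # shift

    def to_map(x, y):
        z = a * complex(x, y) + b
        return z.real, z.imag

    return to_map


# Hypothetical control points: two image locations and their map coordinates
to_map = fit_similarity([(0, 0), (10, 0)], [(100, 200), (120, 200)])
print(to_map(0, 10))  # (100.0, 220.0)
```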

Spatial Datasets

  • Non-spatial datasets are just tables

  • Spatial datasets just add location data to the table

Spatial Datasets

  • Vector Spatial Datasets
    • Coordinates for every object
    • Multiple coordinates for lines, polygons

Code | Name | Location
001 | Minas Gerais | -48.77246, -17.773988
002 | Rio de Janeiro | -49.24686, -16.819800

Spatial Datasets

  • Vector Spatial Datasets
    • Coordinates for every object
    • Multiple coordinates for lines, polygons

Code | Name | Location
001 | Minas Gerais | MULTIPOLYGON (((-48.77246 -17.773988, -48.77252 -17.773970, -48.77266 -17.773990)))
002 | Rio de Janeiro | MULTIPOLYGON (((-49.24686 -16.819800, -49.24701 -16.819812, -49.24707 -16.819838)))

Spatial Datasets

  • One single ‘Multipolygon’ can be complicated
    • Composed of many distinct polygons
    • Polygons can have ‘holes’ in them
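
One way to picture this structure is as nested lists: a multipolygon is a list of polygons, and each polygon is a list of rings, where the first ring is the outer boundary and any further rings are holes. The Python sketch below computes the area of such a shape with the shoelace formula. It assumes planar (projected) coordinates, and the nesting convention and function names are illustrative.

```python
def ring_area(ring):
    """Area enclosed by a ring of (x, y) points (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def polygon_area(rings):
    """Outer ring minus the area of every hole."""
    outer, *holes = rings
    return ring_area(outer) - sum(ring_area(h) for h in holes)


def multipolygon_area(polygons):
    """Sum over all the distinct polygons that make up the shape."""
    return sum(polygon_area(rings) for rings in polygons)


# Hypothetical shape: a 4x4 square with a 1x1 hole, plus a separate 2x2 island
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
hole = [(1, 1), (2, 1), (2, 2), (1, 2)]
island = [(10, 10), (12, 10), (12, 12), (10, 12)]

print(multipolygon_area([[square, hole], [island]]))  # 19.0
```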

Spatial Datasets

  • Raster Spatial Datasets
    • Coordinates for every data point

x | y | value
-106.05 | 35.96 | 0
-106.06 | 35.96 | 13
-105.07 | 35.96 | 2
-105.08 | 35.96 | 0

Spatial Datasets

  • Historically, vector data has been stored as shapefiles
    • A shapefile splits the geometry, attribute table, index and projection into separate files

File | Contains
Data.shp | Geometry details
Data.dbf | Non-spatial attribute data (a table)
Data.shx | Indexing of the geometry to match the table
Data.prj | Details of the projection
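
Because a shapefile is really a set of files sharing one base name, a common source of errors is copying only the .shp file. A small Python sketch that reports which of the four components listed above are missing (the function name is hypothetical):

```python
from pathlib import Path

SHAPEFILE_PARTS = (".shp", ".dbf", ".shx", ".prj")


def missing_shapefile_parts(shp_path):
    """Return the extensions of the component files that do not exist
    alongside the given shapefile path."""
    base = Path(shp_path)
    return [ext for ext in SHAPEFILE_PARTS if not base.with_suffix(ext).exists()]


# An empty list means all four files are present
print(missing_shapefile_parts("Data.shp"))
```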

Spatial Datasets

  • Raster data is typically stored as .tiff files
    • The same format as you get from a camera or scanner
    • But with location and projection data embedded (GeoTIFF), so that we know where on earth the image belongs

Non-Spatial Joins

  • Most of our data is non-spatial, but could be made spatial
    • Election results
    • Death rates
    • Welfare payments
    • Conflict
  • We can make this data spatial if we link it to existing spatial (location) data
    • Using common identifiers in both datasets
    • Non-spatial joins

Non-Spatial Joins

  • Governments publish school performance data
    • But what is the spatial pattern of school performance?
    • Better in the city centre or in the suburbs?
  • We need a source for the location of the schools
    • Perhaps from a separate geographical survey
    • Or by georeferencing their addresses
  • How do we combine the school performance and location datasets?
    • By code
    • By name?
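
A non-spatial join of this kind can be sketched in a few lines of Python. The example below joins on a shared code and keeps track of rows that fail to match. It also shows a simple name clean-up, since joining by name only works if spelling, accents and capitalisation agree. All data, column names and function names here are hypothetical.

```python
import unicodedata


def normalise_name(name):
    """Lower-case, strip accents and collapse spaces so that names
    written slightly differently can still match."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.casefold().split())


def join_by_key(rows, location_rows, key="code"):
    """Attach location columns to each row that shares the key value.
    Returns (joined rows, rows with no matching location)."""
    lookup = {loc[key]: loc for loc in location_rows}
    joined, unmatched = [], []
    for row in rows:
        match = lookup.get(row[key])
        if match is None:
            unmatched.append(row)
        else:
            joined.append({**match, **row})
    return joined, unmatched


# Hypothetical school data; codes are kept as text to preserve leading zeros
performance = [{"code": "001", "score": 7.1}, {"code": "003", "score": 5.4}]
locations = [{"code": "001", "lon": -46.72, "lat": -23.56},
             {"code": "002", "lon": -46.65, "lat": -23.55}]

joined, unmatched = join_by_key(performance, locations)
print(joined)     # school 001 now has a score and a location
print(unmatched)  # school 003 has no known location
print(normalise_name("São  Paulo "))  # sao paulo
```

Checking the unmatched rows is the important step: a join that silently drops units can change the spatial pattern you end up mapping.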

Non-Spatial Joins

  • examples of types of spatial analysis